93 research outputs found
PCA and K-Means decipher genome
In this paper, we aim to give a tutorial for undergraduate students studying
statistical methods and/or bioinformatics. The students will learn how data
visualization can help in genomic sequence analysis. Students start with a
fragment of genetic text of a bacterial genome and analyze its structure. By
means of principal component analysis they ``discover'' that the information in
the genome is encoded by non-overlapping triplets. Next, they learn how to find
gene positions. This exercise on PCA and K-Means clustering enables active
study of the basic bioinformatics notions. Appendix 1 contains program listings
that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet
usage in moving frame for a series of bacterial genomes from GC-poor to GC-rich
ones. Animated 3D PCA plots are attached as separate gif files. Topology
(cluster structure) and geometry (mutual positions of clusters) of these plots
depend clearly on GC-content.
Comment: 18 pages, with program listings for MatLab, PCA analysis of genomes
and additional animated 3D PCA plots
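The feature-extraction step that precedes the PCA in this exercise can be sketched as follows. This is a minimal illustration, not the MatLab listing from the paper's appendix: the function names and the `window` and `step` defaults are assumptions chosen for the example. Each window of the sequence is cut into non-overlapping triplets and turned into a 64-dimensional frequency vector; these vectors are what a PCA or K-Means routine would then receive.

```python
from collections import Counter
from itertools import product

# All 64 possible triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment):
    """Relative frequencies of the non-overlapping triplets in a fragment."""
    # Cut the fragment into consecutive triplets starting at position 0.
    codons = [fragment[i:i + 3] for i in range(0, len(fragment) - 2, 3)]
    counts = Counter(codons)
    # Triplets with symbols outside A, C, G, T are ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def sliding_window_features(sequence, window=300, step=3):
    """Return (start, frequency_vector) pairs for a moving window.

    The window and step defaults are illustrative assumptions.
    """
    features = []
    for start in range(0, len(sequence) - window + 1, step):
        fragment = sequence[start:start + window]
        features.append((start, triplet_frequencies(fragment)))
    return features
```

For example, `sliding_window_features(genome, window=300, step=1)` yields one 64-dimensional point per window position; shifting the window by one or two letters changes the reading frame, which is why the frame structure can become visible once the points are projected onto principal components.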
PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes
Multidimensional data distributions can have complex topologies and variable
local dimensions. To approximate complex data, we propose a new type of
low-dimensional ``principal object'': a principal cubic complex. This complex
is a generalization of linear and non-linear principal manifolds and includes
them as a particular case. To construct such an object, we combine a method of
topological grammars with the minimization of an elastic energy defined for its
embedment into multidimensional data space. The whole complex is presented as a
system of nodes and springs and as a product of one-dimensional continua
(represented by graphs), and the grammars describe how these continua transform
during the process of optimal complex construction. The simplest case of a
topological grammar (``add a node'', ``bisect an edge'') is equivalent to the
construction of ``principal trees'', an object useful in many practical
applications. We demonstrate how it can be applied to the analysis of bacterial
genomes and for visualization of cDNA microarray data using the ``metro map''
representation. The preprint is supplemented by animation: ``How the
topological grammar constructs branching principal components
(AnimatedBranchingPCA.gif)''.
Comment: 19 pages, 8 figures
Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization
Principal manifolds are defined as lines or surfaces passing through ``the
middle'' of data distribution. Linear principal manifolds (Principal Components
Analysis) are routinely used for dimension reduction, noise filtering and data
visualization. Recently, methods for constructing non-linear principal
manifolds were proposed, including our elastic maps approach which is based on
a physical analogy with elastic membranes. We have developed a general
geometric framework for constructing ``principal objects'' of various
dimensions and topologies with the simplest quadratic form of the smoothness
penalty which allows very effective parallel implementations. Our approach is
implemented in three programming languages (C++, Java and Delphi) with two
graphical user interfaces (VidaExpert
http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa
http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we
overview the method of elastic maps and present in detail one of its major
applications: the visualization of microarray data in bioinformatics. We show
that the method of elastic maps outperforms linear PCA in terms of data
approximation, representation of between-point distance structure, preservation
of local point neighborhood and representing point classes in low-dimensional
spaces.
Comment: 35 pages, 10 figures
Astrocytes organize associative memory
We investigate one aspect of the functional role played by astrocytes in the neuron-astrocyte networks of the mammalian brain. To highlight the effect of neuron-astrocyte interaction, we consider simplified networks with bidirectional neuron-astrocyte communication and without any connections between neurons. We show that two facts alone, that an astrocyte covers several neurons and that calcium events in an astrocyte occur on a different time scale, can lead to the appearance of neural associative memory. Without any doubt, this mechanism makes neuron networks more flexible in learning and hence may contribute to explaining why astrocytes have been evolutionarily needed for the development of the mammalian brain.
Modeling Working Memory in a Spiking Neuron Network Accompanied by Astrocytes
We propose a novel biologically plausible computational model of working memory (WM) implemented by a spiking neuron network (SNN) interacting with a network of astrocytes. The SNN is modeled by synaptically coupled Izhikevich neurons with a non-specific connection architecture. Astrocytes generating calcium signals are connected by local gap-junction diffusive couplings and interact with neurons via chemicals diffusing in the extracellular space. Calcium elevations occur in response to the increased concentration of the neurotransmitter released by spiking neurons when a group of them fire coherently. In turn, activated astrocytes release gliotransmitters that modulate the strength of the synaptic connections in the corresponding neuronal group. Input information is encoded as two-dimensional patterns of short applied current pulses stimulating neurons. The output is taken from the frequencies of transient discharges of the corresponding neurons. We show how a set of information patterns with significantly overlapping areas can be uploaded into the neuron-astrocyte network and stored for several seconds. Information retrieval is organized by applying a cue pattern that represents one pattern from the memory set distorted by noise. We found that successful retrieval, with the correlation between the recalled pattern and the ideal pattern exceeding 90%, is possible for the multi-item WM task. Having analyzed the dynamical mechanism of WM formation, we discovered that astrocytes operating at a time scale of a dozen seconds can successfully store traces of neuronal activations corresponding to information patterns. In the retrieval stage, the astrocytic network selectively modulates synaptic connections in the SNN, leading to successful recall. The information and dynamical characteristics of the proposed WM model agree with classical concepts and other WM models.
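The neuron model named in this abstract can be illustrated with a single Izhikevich neuron. The sketch below is not the paper's network: it omits synapses, astrocytes and gliotransmitter modulation entirely. It uses the standard published form of the Izhikevich equations with the commonly quoted "regular spiking" parameters; the function name, the Euler time step and the input currents are assumptions made for this example.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Euler-integrate one Izhikevich neuron and return spike times in ms.

    The defaults are the usual regular-spiking parameters; the time step
    and the constant input current are illustrative assumptions.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spike_times = []
    steps = int(duration_ms / dt)
    for step in range(steps):
        # Membrane potential dynamics: dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        # Recovery dynamics: du/dt = a (b v - u), using the updated v
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: record the time, reset v and increment u.
            spike_times.append((step + 1) * dt)
            v = c
            u += d
    return spike_times
```

Calling `simulate_izhikevich(10.0)` returns the spike times for a constant input current, while `simulate_izhikevich(0.0)` returns an empty list because the neuron settles at its resting potential.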
Robust simplifications of multiscale biochemical networks
Background: Cellular processes such as metabolism, decision making in development and differentiation, signalling, etc., can be modeled as large networks of biochemical reactions. To understand the functioning of these systems, there is a strong need for general model reduction techniques that simplify models without losing their main properties. In systems biology we also need to compare models or to couple them as parts of larger models. In these situations reduction to a common level of complexity is needed.
Results: We propose a systematic treatment of model reduction of multiscale biochemical networks. First, we consider linear kinetic models, which appear as "pseudo-monomolecular" subsystems of multiscale nonlinear reaction networks. For such linear models, we propose a reduction algorithm which is based on a generalized theory of the limiting step that we have developed in [1]. Second, for non-linear systems we develop an algorithm based on dominant solutions of quasi-stationarity equations. For oscillating systems, quasi-stationarity and averaging are combined to eliminate time scales much faster and much slower than the period of the oscillations. In all cases, we obtain robust simplifications and also identify the critical parameters of the model. The methods are demonstrated for simple examples and for a more complex model of the NF-κB pathway.
Conclusion: Our approach allows critical parameter identification and produces hierarchies of models. Hierarchical modeling is important in "middle-out" approaches when there is a need to zoom in and out across several levels of complexity. Critical parameter identification is an important issue in systems biology with potential applications to biological control and therapeutics. Our approach also deals naturally with the presence of multiple time scales, which is a general property of systems biology models.
Improving Randomized Learning of Feedforward Neural Networks by Appropriate Generation of Random Parameters
In this work, a method of generating random parameters for randomized
learning of a single-hidden-layer feedforward neural network is proposed. The
method first randomly selects the slope angles of the hidden neurons'
activation functions from an interval adjusted to the target function, then
randomly rotates the activation functions, and finally distributes them across
the input space. For complex target functions the proposed method gives better
results than the approach commonly used in practice, where the random
parameters are selected from a fixed interval. This is because it introduces
the steepest fragments of the activation functions into the input hypercube,
avoiding their saturation fragments.
A Linear Algebra Approach for Detecting Binomiality of Steady State Ideals of Reversible Chemical Reaction Networks
Motivated by problems from Chemical Reaction Network Theory, we investigate
whether steady state ideals of reversible reaction networks are generated by
binomials. We take an algebraic approach considering, besides concentrations of
species, also rate constants as indeterminates. This leads us to the concept of
unconditional binomiality, meaning binomiality for all values of the rate
constants. This concept is different from conditional binomiality that applies
when rate constant values or relations among rate constants are given. We start
by representing the generators of a steady state ideal as sums of binomials,
which yields a corresponding coefficient matrix. On these grounds we propose an
efficient algorithm for detecting unconditional binomiality. That algorithm
uses exclusively elementary column and row operations on the coefficient
matrix. We prove asymptotic worst case upper bounds on the time complexity of
our algorithm. Furthermore, we experimentally compare its performance with
other existing methods.
Validation of nonlinear PCA
Linear principal component analysis (PCA) can be extended to a nonlinear PCA
by using artificial neural networks. But the benefit of curved components
requires a careful control of the model complexity. Moreover, standard
techniques for model selection, including cross-validation and more generally
the use of an independent test set, fail when applied to nonlinear PCA because
of its inherent unsupervised characteristics. This paper presents a new
approach for validating the complexity of nonlinear PCA models by using the
error in missing data estimation as a criterion for model selection. It is
motivated by the idea that only the model of optimal complexity is able to
predict missing values with the highest accuracy. While standard test set
validation usually favours over-fitted nonlinear PCA models, the proposed model
validation approach correctly selects the optimal model complexity.
Comment: 12 pages, 5 figures
Deriving effective models for multiscale systems via evolutionary Γ-convergence
We discuss possible extensions of the recently established theory of evolutionary Gamma-convergence for gradient systems to nonlinear dynamical systems obtained by perturbation of a gradient system. Thus, it is possible to derive effective equations for pattern-forming systems with multiple scales. Our applications include homogenization of reaction-diffusion systems, the justification of amplitude equations for Turing instabilities, and the limit from pure diffusion to reaction-diffusion. This is achieved by generalizing the Gamma-limit approaches based on the energy-dissipation principle or the evolutionary variational estimate.